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Abstract 

We consider the problem of image reconstruction from a finite number of pro- 
jections over the space L 1 (Cl), where 0 is a compact subset of R 2 . We prove that, 
given a discretization of the projection space, the function that generates the cor- 
rect projection data and maximizes the Boltzmann-Shannon entropy is piecewise 
constant on a certain discretization of Q, which we call the “optimal grid”. It is on 
this grid that one obtains the maximum resolution given the problem setup. The 
size of this grid grows very quickly as the number of projections and number of cells 
per projection grow, indicating fast computational methods are essential to make 
its use feasible. 

We use a Fenchel duality formulation of the problem to keep the number of 
variables small while still using the optimal discretization, and propose a multilevel 
scheme to improve convergence of a simple cyclic maximization scheme applied to 
the dual problem. 
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1 Introduction 


In computerized tomography (CT), one encounters the problem of reconstructing an im- 
age, or a density, defined by the function x(s,t), given only a finite number of projection 
data. General references include [1] and [2], 

The projection data is typically of the form 

b™ = J x(s,t)dsdt (1.1) 

where the support of x is assumed to lie in the bounded region fl C R 2 and Q.™ is the 
fc th strip orthogonal to the m th projection (see Figure 1). We assume that there are M 
projections and that the m th projection has K m cells. Let b be the vector of projection 
data, N the length of b, and ip™ the characteristic function of Q™. We then rewrite (1.1) 
as 

b = Ax , (1.2) 

where A : L 1 (fl) — ► R N via 

(Ax) kiTn = f x(s,t)ip^(s,t)dsdt. (1.3) 

J fi 

The reconstruction problem we study is: given the projection data 6, find a density 
function x such that Ax = b. Since A has an infinite-dimensional kernel, solutions, if they 
exist, are not unique. The problem then becomes: find the “best” function x Q such that 
Ax 0 = b. The concept of “best” is ambiguous to be sure, but some criteria have been 
gaining acceptance. In this paper we choose to study the solution with maximum entropy 
as defined by Shannon [3] in information theory. For an informal discussion of entropy 
and information theory, see for example [4], For a discussion of maximum entropy in 
image reconstruction, see [5]. 

In our context, we wish to find the function x 0 € L l (Q) such that x 0 attains 

sup I -L x(s, t) ln[x(s, t)]dsdt : Ax = fcj . (1.4) 


This is the maximum entropy solution to Ax — b. Simply because we would rather 
minimize a convex function than maximize a concave function, we rewrite this as a convex 
minimization program via 


p = inf [J x(s, t) ln[x(s, <)]ds dt : Ax = 6 j . 


(1.5) 


This is called the primal problem. If we further define the function ef> : R 
by 


<P( U ) = 



win u u > 0 
u = 0 


u < 0 


( — oo, +oo] 
( 1 . 6 ) 
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then we can rewrite (1.5) as 

p = inf|y 4>{x(s, t))dsdt : Ax = , (1.7) 

which is in a form that has been receiving much attention in the optimization community 
of late. In particular, see Borwein and Lewis [6]. One of the features of this functional is 
that it forces feasible functions to be nonnegative, as a density or image function should 
be. We shall see that it has other properties that are computationally and theoretically 
attractive, one of the most important being that solutions exist in L l (Q.) under rather 
mild conditions. 

In this paper we shall characterize solutions to (1.5) given some reasonable conditions 
on the data, b, and show that the solution in L l (Q) for the CT case is piecewise con- 
stant, but usually not on a rectangular grid. While this result seems to be known in 
the tomography community, it is rare that one finds a mathematically sound derivation 
of the solution. The first half of this paper discusses the difficulties in addressing this 
problem and references the literature to outline a correct proof of our characterization. 
Note that we do not initially impose a discretization of L j (Q) or Q, only of the data. The 
discretization we shall use arises as a consequence of the form of the functions ip™ . 

The second half of this paper discusses implementation details. It will turn out that 
the appropriate grid for optimal resolution (the “optimal grid”) is very large compared 
to the amount of data, N, one has. We shall also see that finding the optimal function 
can be reduced to solving a problem in 1R N , but the intermediate calculations require use 
of the (large) optimal grid. We have found that a simple cyclic coordinate maximization 
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scheme applied to the Fenchel dual of (1.5), while convergent, tends to stall after a few 
iterations. We describe a multigrid approach to accelerate convergence. 

2 Maximum Entropy Solutions 

2.1 The Entropy Functional 

Using the entropy functional 

/ : L^fl) — » (— oo,+oo] : x ► — f <j>(x(s,t))dsdt (2.1) 

j n 

to pick a “best” density function x has been popularized by Shannon in information 
theory, but also arises in the context of thermodynamics with Boltzmann. For this reason 
we call this the Boltzmann-Shannon entropy. 

Entropy is, in short, the expected amount of information present in a probability 
density x. In our context, one can think of an image (appropriately scaled) as a probability 
density function, and computing the feasible density with maximum entropy yields the 
density carrying the most information. The standard reference is Shannon and Weaver [3], 
but many basic probability books contain some discussion of information and entropy (see 
[4, 7] for example). References that deal specifically with entropies in image reconstruction 
are [5, 8]. 

We remark that the theory and methods developed here apply directly to other objec- 
tive functionals, in particular, minimum L 2 -norm solutions. In fact, with the minimum 
£ 2 -norm functional, our iterative method is essentially only changed by replacing * by + 
and / by — . 

2.2 Existence of Solutions 

In this section we prove that solutions to (1.5) exist. Usually, this point is ignored, but is 

nevertheless an important issue. = ••>.« . .._ .... 

Throughout, let X be a linear normed space with topology r. We begin with some 
definitions. : 

Definition 2.1 We say a set K Q X is T-sequentially compact if every sequence from 
K h as a r-convergent subsequence. 

Definition 2.2 Given a function f : X — > (— oo, +oo] . for a £ Ft. we define the lower 
level sets L a of f to be 

L a = {x: f(x) < a}. (2.2) 

The following is a general existence theorem for solutions to constrained optimization 
problems. 


t 
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Theorem 2.3 Let f be r-lower scmicoul unions (lac) possessing t - sequentially compact 
lower level sets , and let C be a T-closed set. Then 

p = inf {/(x) : x € C} (2.3) 

is attained for some Xq 6 C . 

Proof: Let {x n } C C be a sequence such that f{x n ) — * p. Letting a = p+ 1, eventually 

all x n e L q , for n > N. By r-sequential compactness of L a there is a subsequence x Hk 

that converges to a point xo G L a . Since C is T-closed, then xo G C‘, f is r-lsc, so 

P - lim f(x n J > f (x 0 ) > p (2.4) 

K— OO 

and, thus, p = f(x o). ■ 

We are interested in the case where the r-topology is the weak topology on L 1 (fl). 
Let C = {x : Ax = 6} for A : X — > 1R. N linear and continuous. Then C = ^4 _1 ({6}) 
is closed (since A is continuous) and convex (since A is linear) and, hence, weakly closed. 
This is called Mazur’s theorem; see [9, Corollary 4 Chapter 2], for example. 

We now direct our attention to the functional 

f(x)= [ x(s 1 t)\n[x(s,t)]dsdt. (2.5) 

Jn 

From [10, Theorem 2.2], we see that / is weakly Isc with weakly compact lower level 
sets, provided Q is of finite measure. To apply Theorem 2.3, we need weakly sequentially 
compact lower level sets. The Eberlein-Smulian theorem [9] states that a subset of a 
Banach space is weakly compact if and only if it is weakly sequentially compact, and thus 
we see that the lower level sets of / are weakly sequentially compact, so we can apply 
Theorem 2.3 to obtain the following: 

Theorem 2.4 With f defined as in (2.5) and C = {x : Ax = 6} for A : L 1 (f2) — > JR N 
linear and continuous, the infimum 

p — inf {/(x) : Ax = 6} , (2.6) 

if finite , is attained for some Xo € L 1 (J1) such that Ax o = b. 

Note that if X = L°°, then the level sets L a of / are not bounded in the norm 
topology. Hence L a is not compact for any of the norm, weak or weak-* topologies, 
which is a consequence of the fact that a continuous function attains its maximum on a 
compact set, of the theory of dual pairsfll], and of the principle of uniform boundedness, 
respectively. Thus, although it is tempting to approach the image reconstruction problem 
in L x , the initial problem of existence is much more difficult. However we will see that 
L 1 -optimal solutions are actually -optimal solutions as well. We will also see why one 
might want to pose the problem in L x . 
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2.3 Uniqueness of Solutions 

In this section we show that solutions to (1.5) are unique. 

Definition 2.5 A function g : X — > (— oo, +oo] is convex if, for nil x, y E X and all 
A € (0, 1), we have 

g(Xx + (1 - A )y) < A g(x) 4- (1 - A)^(y). (2.7) 

Also, g is strictly convex if, whenever x ^ y and g{x) , g{y) € M, this inequality is strict. 
A set C Cl is convex if, whenever x,y 6 C and A G (0,1), then 

Xx + (1 — A )y G C. (2.8) 

Lemma 2.6 If f(x) = J <f>{x{u))du, then f is strictly convex if and only if <j> is strictly 
convex. 


Proof: Assume that / is strictly convex and that there exists au^v and a A 6 (0, 1) 

such that 

<f>(Xu + (1 — A)u) > A <f>(u) + (1 — A)0(u). (2.9) 

Then let x(t) = u and y(t) = v, so that 

/(Ax + (1 - A )y) > A/(x) + (1 - A )f(y), (2.10) 

contradicting the assumption that / is strictly convex. 

Conversely, let E = {t : x(t) ^ y{t)} and assume that m(E) > 0. Then 

/(Ax + (1 - A )y) = / <^>(Ax(t) + (1 - A )y(t)dt + [ ej>{x{t))dt (2.11) 

JE JE 

< J X4>(x{t)) + (1 — X)(f>(y(t))dt + J <p(x(t))dt (2.12) 
= A/(x) + (1 — A)/(y) (2.13) 

where the strict inequality is due to the strict convexity of <f>. ■ 

It is easy to check that ef>{u) as defined in (1.6) is strictly convex, thus / from (2.5) is 
as well. 

Theorem 2.7 If f is strictly convex and C is a convex set, then solutions to 

p = inf{/(x) : x G C} (2.14) 


are unique, provided they exist. 
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Proof: Suppose p = f(x) — f(y), where x^y G C. Since C is convex, then \x+\ y G C 
and 1 1 

f(x/2 + y/2) < -f(x) + -f(y) = p (2.15) 

contradicting the definition of p. ■ 

Putting together these results we have the following. 

Theorem 2.8 If £1 is a set of finite measure, then the solution to (1.5) exists and is 
unique. 


2.4 Characterizing the Solution 

In this section we characterize solutions to 


p = inf <f>(x(s, t))ds dt : J xtp™ — b ™ , 


m = 1, . . . , M, k=l,...,K r 




(2.16) 


where 


( u\nu — u 



0 

+oo 


it > 0 

u = 0 
u < 0. 


(2.17) 


This is the image reconstruction problem we introduced in Section 1. We have included 
a linear factor in the objective functional; however, if we assume that the projections 
cover the image, then this factor does not change the solution; it only simplifies certain 
formulae. A typical approach is to attach a Lagrange multiplier A to the constraints and 
differentiate the Lagrangian at the optimal xq (which we now know exists) to obtain 


xo(«,t) = W‘(^ T A(s,*)) =exp(A T A(M)), (2-18) 


where 


M K m 

A t : L~(Q) - R n via (>1 t A) (s,t) = £ Y.KWM- 

m=l Jfc=l 


(2.19) 


However, the functional f(x ) — J (f>(x(s, t))dsdt is not differentiable. Indeed, / = +oo on 

a dense subset of L T (Q) and is therefore not even continuous. It is for this reason that 
some people have chosen to work in C(Q) or L X (Q) [12, 13] where / is differentiable, but 
the question of existence and attainment is much more difficult there. 

A correct approach is to use Fenchel duality with a constraint qualification (CQ). 
While the classical CQ fails to apply in our example, in [6] a CQ is developed that does 
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apply to CT. Heuristically, we assume a multiplier A exists so that 


( 2 . 20 ) 

( 2 . 21 ) 

( 2 . 22 ) 


p = inf {/(x) + (b - Ax, A)} 

= (b, A) + inf {/(x) - (x, A 7 !)} 

= (6, A) -sup{(x,A T A) -/(x)}. 

For a convex function g : X — > (— oo, +oo] we define g* : X * — > (— oo, +oo] by 

g * ( V ) = sup {(x, y) - g(x)} . (2.23) 

X 

This is called the Fenchel conjugate of g at y. Using this definition, we see from (2.22) 

P=(b,\)-f(A T X). (2.24) 

Now, for any A, 

inf {/(x) + (6 — Ax, A)} = (6, A) — /* (A t A) < inf {/(x) : Ax — 6} = p. (2.25) 
Therefore, if A exists, then 

p = max { (6, A) - /* (A r A) } . (2.26) 

This is the Fenchel dual of (1.5). In our problem, it earn be shown [14] that for y £ L°°(Q) 

f*{y) = j <p m {y(s,t))dsdt, (2.27) 

that is, the conjugate of the integral functional / is given as an integral function of 
<f>*. From [6], if 3x € X 1 (f2) where x > 0 a.e., /(x) £ 2R and Ax = b (the constraint 
qualification), then a Lagrange multiplier A does exist. Also, if A solves (2.26), then the 
optimal xo(s, t) for (1.5) is 

x 0 (s, t) = (</>*)' (a t A(s, <)) . (2.28) 

In the case <p is of the form (2.17), it is easy to show that (j>*(v) = e v , and we get 

xo(s, t) = exp (A 7 A(s, tfj , (2.29) 

where A solves (2.26). This matches the heuristic derivation, but this is no accident since 
in fair generality, ( </>')“ 1 = (</>’)'. Thus, solving the image reconstruction problem, (1.5), is 
equivalent to solving the dual problem (2.26), which is an unconstrained finite-dimensional 
differentiable concave maximization problem. 
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Figure 2: The optimal grid. 

2.5 The Optimal Grid 

Rewriting the solution to the image reconstruction problem (2.29) in a more explicit form, 

we have / M K m \ 

x 0 (s, t) = exp f X) 5Z a jT V' fc* ( s > 0 ) « ( 2 ‘ 30 ) 

\m= 1 / 

see (2.19). It can be seen from the concavity of (2.26) that any function x of the form 

(2.30) that satisfies Ax = b must be optimal. 

Recall that ip™ is the characteristic function of the fc th strip emanating from the 
m th projection. Thus, the solution in (2.30) implies that the optimal function in L\Q) is 
piecewise constant on the grid obtained by intersecting all of the strips. This grid has been 
observed for physical reasons, [8], but here we have shown that the best (from maximum 
entropy considerations) function from L J (0) is this piecewise constant function. For this 
reason, we call this discretization of the optimal grid. 

A typical grid is shown in Figure 2. Here we have used 8 projections each divided into 
12 cells. The central point of the theory presented above is that the exact solution of the 
image reconstruction problem is piecewise constant on this grid. 
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3 Implementation 


3.1 Relaxation 


Now we develop a simple iterative procedure to solve the image problem by solving the 

dual problem (2.26) for the optimal A. Recall that to solve the dual problem we seek to 
maximize 

G(\) = (6, A) — j exp(A T \(s,t))dsdt. (3.1) 

This is a simple matter once one observes that G is concave and differentiable, so maxi- 
mizing G is just finding critical points. The derivative with respect to A is 

VG(X) = b- A j exp(A T \(s,t))dsdt. (3.2) 

This gives N equations for the N unknown A’s. 

An obvious iterative scheme is to cycle through the components, A*, of VG, correcting 
each A fc in turn so that the k th component of VG is zero, that is, so that G is maximal 
with respect to each individual variable. We choose A™ so that 


b? = (Aexp(A T \)) km = J exp(A T \(s,t))dsdt. (3.3) 

After some simplifications, a single step of this scheme can be written as 


ex P(A*0 


°k l 


h m 

°k f 


L;Mt:kk) l„ ir ton*” 


(3.4) 


where E' is the sum over all projections m and cells k except m' and k\ IT is similarly 
defined and pf = exp(A^). 

Now, because A T A is piecewise constant on the optimal grid, the integral of exp(A T A) 
along any strip is just a sum over each polygon in that strip of the area of the polygon 

times the product of the p’s that correspond to the particular cell from each projection 
that makes up the polygon: 



= E a«»(p) n vZ, , 

_ m£m ! 


(3.5) 


where the product is only over the cells k m in projection m for polygon p. 

The point of this discussion is twofold. First, we can see from (3.4) that we never 
need to exponentiate since we only need the p's. Second, the areas of the polygons can 

be precomputed, meaning that these integrals cam be calculated exactly; no numerical 
integration is needed. 
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Figure 3: A subset of the optimal grid. 


An important feature of this development is that there are no approximations or 
discretizations being made in the whole program, except for the initial discretization of 
the data into the vector b. We have characterized the L^Q) solution as a piecewise 
constant function on the optimal grid, and the necessary integrals can be performed 
exactly on this grid. For this reason, in our implementation we calculate the areas of each 
of the polygons in the optimal grid; in fact, to obtain Figure 2, we in fact plotted the 
polygons themselves, not just intersecting lines. To demonstrate this fact, in Figure 3 we 
plotted 1000 of the 2096 polygons from Figure 2. 

As can be observed in Figure 2, the number of polygons in the optimal grid can be very 
large compared to the number of projections and cells per projection. This number grows 
very quickly as a function of these two variables; for example, with 12 projections placed 
uniformly around the circle and 10 cells per projection, we generate 2904 polygons; with 
20 cells per projection we generate 12,561 polygons. While the optimal grid generation 
is a time and storage intensive procedure, given a fixed geometry we need only run this 
part of the program once. To reconstruct images using such large data sets, fast methods 
are essential. 

The iterative method described above tends to stall after a few iterations. This effect 
can be observed in the reported data from [8] . In Figure 4 the top two curves represent the 
rates of convergence using this scheme on a 3 projection, 4 cell per projection problem. 
Here we have graphed both the rate associated with the residual || Ax - 6||cc and the 
rate of convergence of the entropies computed as G(A new ). Since we give as initial data a 
known function of given A’s, we can compute the true entropy to which the iterates should 
be converging; this is a useful debugging tool since we can monitor both the residual and 
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Figure 4: Residual and entropy rates, with and without MG. 

the entropy. While convergent, note that the convergence is initially good, but the rate 
degrades rapidly to steady off at about 0.85. 

3.2 Multilevel Methods 

To improve the convergence rates observed, we have implemented a two-grid multilevel 
scheme with unigrid [15] corrections. In this section we describe our method and give 
some preliminary results. 

Coarsening is achieved by pairing adjacent cells in the projections and updating the 
associated /i’s with a single correction that makes their average residual zero. On the 
coarse grid, we iterate this relaxation process until the norm of the vector of average 
residuals is below a user supplied e, typically 0.05. All of the calculations are done in a 
unigrid fashion on the fine grid. This makes the process more expensive than necessary, 
but its performance is equivalent to the more efficient V-cycle multigrid scheme and it is 
much easier to implement and manipulate. 

Using the same geometry and data as before, but with relaxation accelerated by coarse 
grid corrections, we obtain the rates given in the lower two curves in Figure 4. Note that 
with only a two-grid scheme, we have reduced the convergence rate from about 0.85 to 
about 0.72. 

As a demonstration of the reconstructions we can obtain, we present Figures 5 and 6. 
Recall that the reconstruction is computed on the optimal grid, but for plotting purposes 
we essentially use a square grid. While there are several ways of translating from the 
optimal grid to a square grid, we have chosen simply to evaluate the optimal image 
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Figure 5: Original image. 

function as reconstructed from the optimal A via (2.30) at the lattice points of the square 
grid without performing interpolation. Improving this process could be a direction of 
future investigation. 

Figure 5 is the original image. Note that this is not a piecewise constant function 
on the optimal grid, so our reconstructions cannot be exact. In fact, the function of 
maximum entropy with the data generated by this function is not the original image, as 
shown previously, it is piecewise constant on the optimal grid, as are our reconstructions. 

To obtain Figure 6, we use a simple two-dimensional integrator, based on Simpson’s 
rule, on the original function to make b using 5 projections each with 8 cells. We then 
iterated our multilevel scheme to convergence (so that the i 2 norm of the average residuals 
was less than 10~ 4 ) and plotted the result in Figure 6 as discussed above. This process 
produces a set of A’s that we then used as our initial data in the routine to obtain a 
reconstruction of the first reconstruction. This next reconstruction is virtually identical 
to the first, as the theory predicts. 

As a final note, an extra benefit of this scheme is data compression. Given a data 
collection geometry, when we collect the N pieces of data, we need only solve the recon- 
struction problem once to get the N fi's. From these we can reconstruct the image to any 
level of resolution desired; indeed, we have shown that the piecewise constant function on 
the optimal grid is the most information one can extract from the data. 
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Figure 6: Reconstruction. 


4 Conclusions 

We have seen that the solution to the maximum entropy image reconstruction problem 
posed in L l (Q) is a piecewise constant function on the optimal grid. We have also seen 
that each iterate of the simple iterative scheme for solving the associated dual problem can 
be computed exactly; no numerical integration or approximations are needed. Finally, we 
observed that a unigrid scheme to accelerate convergence shows potential, though more 
testing and analysis is needed. 

As a final comment, we note that that the mathematics used to derive the optimal grid 
and the iterative scheme can be applied to other objective functionals / and other geome- 
tries, for example, minimum L 2 -norm, fan beam projections and non-symmetric placement 
of the projections. These issues and a more complete discussion of the mathematics in 
optimization techniques for image reconstruction from projections will be covered in a 
future paper. 
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